NSF PAR Search | NSF Public Access Repository

ParGeo: A Library for Parallel Computational Geometry

https://doi.org/10.1145/3503221.3508429

Wang, Yiqiu; Yesantharao, Rahul; Yu, Shangdi; Dhulipala, Laxman; Gu, Yan; Shun, Julian (January 2022, 30th Annual European Symposium on Algorithms (ESA 2022))

Full Text Available
A Parallel Batch-Dynamic Data Structure for the Closest Pair Problem

https://doi.org/a

Wang, Yiqiu; Yu, Shangdi; Gu, Yan; Shun, Julian (July 2021, Proceedings of the International Symposium on Computational Geometry (SoCG))

Full Text Available
Fast Parallel Algorithms for Euclidean Minimum Spanning Tree and Hierarchical Spatial Clustering

https://doi.org/10.1145/3448016.3457296

Wang, Yiqiu; Yu, Shangdi; Gu, Yan; Shun, Julian (June 2021, Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD))

Full Text Available
GeoGraph: A Framework for Graph Processing on Geometric Data

https://doi.org/10.1145/3469379.3469384

Wang, Yiqiu; Yu, Shangdi; Dhulipala, Laxman; Gu, Yan; Shun, Julian (June 2021, ACM SIGOPS Operating Systems Review)

In many applications of graph processing, the input data is often generated from an underlying geometric point data set. However, existing high-performance graph processing frameworks assume that the input data is given as a graph. Therefore, to use these frameworks, the user must write or use external programs based on computational geometry algorithms to convert their point data set to a graph, which requires more programming effort and can also lead to performance degradation. In this paper, we present our ongoing work on the GeoGraph framework for shared-memory multicore machines, which seamlessly supports routines for parallel geometric graph construction and parallel graph processing within the same environment. GeoGraph supports graph construction based on k-nearest neighbors, Delaunay triangulation, and β-skeleton graphs. It can then pass these generated graphs to over 25 graph algorithms. GeoGraph contains high-performance parallel primitives and algorithms implemented in C++, and includes a Python interface. We present four examples of using GeoGraph, and some experimental results showing good parallel speedups and improvements over the Higra library. We conclude with a vision of future directions for research in bridging graph and geometric data processing.
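
To make the point-set-to-graph step in the abstract above concrete, here is a minimal sketch of k-nearest-neighbor graph construction. It is a sequential brute-force illustration using only the Python standard library, not GeoGraph's API or its parallel algorithm; the function name `knn_graph` and the sample points are assumptions made for this example.

```python
import math


def knn_graph(points, k):
    """Return {i: [indices of the k nearest other points]} by brute force.

    Hypothetical helper for illustration only; it is not part of GeoGraph.
    """
    n = len(points)
    graph = {}
    for i in range(n):
        # Rank every other point by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        graph[i] = others[:k]
    return graph


# Four sample 2D points (made up for this sketch).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
graph = knn_graph(points, k=2)
print(graph)
```

The resulting adjacency lists form a directed graph (point 3 lists point 2 as a neighbor, but not the other way around), which is the kind of structure a graph algorithm can then consume. This brute-force version does quadratic work; a library aimed at large inputs would typically rely on a spatial index instead.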
Full Text Available
ParChain: a framework for parallel hierarchical agglomerative clustering using nearest-neighbor chain

https://doi.org/10.14778/3489496.3489509

Yu, Shangdi; Wang, Yiqiu; Gu, Yan; Dhulipala, Laxman; Shun, Julian (January 2021, Proceedings of the VLDB Endowment)

Full Text Available
Theoretically-Efficient and Practical Parallel DBSCAN

https://doi.org/10.1145/3318464.3380582

Wang, Yiqiu; Gu, Yan; Shun, Julian (May 2020, Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD))
Full Text Available
Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search

https://doi.org/10.1145/3183713.3196925

Wang, Yiqiu; Shrivastava, Anshumali; Wang, Jonathan; Ryu, Junghee (January 2018, Proceedings of the 2018 International Conference on Management of Data)

We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC), a similarity search system for ultra-high dimensional datasets on a single machine, that does not require similarity computations and is tailored for high-performance computing platforms. By leveraging an LSH-style randomized indexing procedure and combining it with several principled techniques, such as reservoir sampling, recent advances in one-pass minwise hashing, and count-based estimations, we reduce the computational and parallelization costs of similarity search, while retaining sound theoretical guarantees. We evaluate FLASH on several real, high-dimensional datasets from different domains, including text, malicious URL, click-through prediction, social networks, etc. Our experiments shed new light on the difficulties associated with datasets having several million dimensions. Current state-of-the-art implementations either fail on the presented scale or are orders of magnitude slower than FLASH. FLASH is capable of computing an approximate k-NN graph, from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than 10 seconds. Computing a full k-NN graph in less than 10 seconds on the webspam dataset, using brute-force (n²D), will require at least 20 teraflops. We provide CPU and GPU implementations of FLASH for replicability of our results.
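
As a rough illustration of the minwise hashing idea the abstract above builds on, the sketch below estimates the Jaccard similarity of two sets from short signatures instead of comparing the sets directly. It uses the classic scheme with one hash function per signature slot, not the one-pass variant or any code from FLASH; the helper names, the use of seeded BLAKE2b as the hash family, and the sample sets are all assumptions for this example.

```python
import hashlib


def seeded_hash(seed, item):
    """Deterministic 64-bit hash of `item` under a given seed (assumed hash family)."""
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=256):
    """For each seed, keep the minimum hash value over the set."""
    return [min(seeded_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Two overlapping sample sets: 50 shared items out of 150 distinct, so J = 1/3.
set_a = set(range(0, 100))
set_b = set(range(50, 150))
true_jaccard = len(set_a & set_b) / len(set_a | set_b)

estimate = estimate_jaccard(minhash_signature(set_a), minhash_signature(set_b))
print(f"true={true_jaccard:.3f} estimate={estimate:.3f}")
```

Each signature slot matches with probability equal to the Jaccard similarity, so the estimate tightens as the number of hash functions grows. Indexing schemes in the LSH family group such signature values into buckets so that likely neighbors can be found without scanning every pair.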
Full Text Available